Course : CSC204

CSC204, Linux Basic

This is a linux command line reference for common operations.
http://www.pixelbeat.org/cmdline.html

----------------

Examples marked with • are valid/safe to paste without modification into a terminal, so
you may want to keep a terminal window open while reading this so you can cut & paste.
All these commands have been tested both on Fedora and Ubuntu.
See also more linux commands.

Command		Description
•	apropos whatis	Show commands pertinent to string. See also threadsafe
•	man -t ascii | ps2pdf - > ascii.pdf	make a pdf of a manual page
	which command	Show full path name of command
	time command	See how long a command takes
•	time cat	Start stopwatch. Ctrl-d to stop. See also sw
dir navigation
•	cd -	Go to previous directory
•	cd	Go to $HOME directory
	(cd dir && command)	Go to dir, execute command and return to current dir
•	pushd .	Put current dir on stack so you can popd back to it
file searching
•	alias l='ls -l --color=auto'	quick dir listing
•	ls -lrt	List files by date. See also newest and find_mm_yyyy
•	ls /usr/bin | pr -T9 -W$COLUMNS	Print in 9 columns to width of terminal
	find -name '*.[ch]' | xargs grep -E 'expr'	Search 'expr' in this dir and below. See also findrepo
	find -type f -print0 | xargs -r0 grep -F 'example'	Search all regular files for 'example' in this dir and below
	find -maxdepth 1 -type f | xargs grep -F 'example'	Search all regular files for 'example' in this dir
	find -maxdepth 1 -type d | while read dir; do echo $dir; echo cmd2; done	Process each item with multiple commands (in while loop)
•	find -type f ! -perm -444	Find files not readable by all (useful for web site)
•	find -type d ! -perm -111	Find dirs not accessible by all (useful for web site)
•	locate -r 'file[^/]*\.txt'	Search cached index for names. This re is like glob *file*.txt
•	look reference	Quickly search (sorted) dictionary for prefix
•	grep --color reference /usr/share/dict/words	Highlight occurrences of regular expression in dictionary
archives and compression
	gpg -c file	Encrypt file
	gpg file.gpg	Decrypt file
	tar -c dir/ | bzip2 > dir.tar.bz2	Make compressed archive of dir/
	bzip2 -dc dir.tar.bz2 | tar -x	Extract archive (use gzip instead of bzip2 for tar.gz files)
	tar -c dir/ | gzip | gpg -c | ssh user@remote 'dd of=dir.tar.gz.gpg'	Make encrypted archive of dir/ on remote machine
	find dir/ -name '*.txt' | tar -c --files-from=- | bzip2 > dir_txt.tar.bz2	Make archive of subset of dir/ and below
	find dir/ -name '*.txt' | xargs cp -a --target-directory=dir_txt/ --parents	Make copy of subset of dir/ and below
	( tar -c /dir/to/copy ) | ( cd /where/to/ && tar -x -p )	Copy (with permissions) copy/ dir to /where/to/ dir
	( cd /dir/to/copy && tar -c . ) | ( cd /where/to/ && tar -x -p )	Copy (with permissions) contents of copy/ dir to /where/to/
	( tar -c /dir/to/copy ) | ssh -C user@remote 'cd /where/to/ && tar -x -p'	Copy (with permissions) copy/ dir to remote:/where/to/ dir
	dd bs=1M if=/dev/sda | gzip | ssh user@remote 'dd of=sda.gz'	Backup harddisk to remote machine
rsync (Network efficient file copier: Use the --dry-run option for testing)
	rsync -P rsync://rsync.server.com/path/to/file file	Only get diffs. Do multiple times for troublesome downloads
	rsync --bwlimit=1000 fromfile tofile	Locally copy with rate limit. It's like nice for I/O
	rsync -az -e ssh --delete ~/public_html/ remote.com:'~/public_html'	Mirror web site (using compression and encryption)
	rsync -auz -e ssh remote:/dir/ . && rsync -auz -e ssh . remote:/dir/	Synchronize current directory with remote one
ssh (Secure SHell)
	ssh $USER@$HOST command	Run command on $HOST as $USER (default command=shell)
•	ssh -f -Y $USER@$HOSTNAME xeyes	Run GUI command on $HOSTNAME as $USER
	scp -p -r file dir/ $USER@$HOST:	Copy with permissions to $USER's home directory on $HOST
	scp -c arcfour $USER@$LANHOST: bigfile	Use faster crypto for local LAN. This might saturate GigE
	ssh -g -L 8080:localhost:80 root@$HOST	Forward connections to $HOSTNAME:8080 out to $HOST:80
	ssh -R 1434:imap:143 root@$HOST	Forward connections from $HOST:1434 in to imap:143
	ssh-copy-id $USER@$HOST	Install public key for $USER@$HOST for password-less log in
wget (multi purpose download tool)
•	(cd dir/ && wget -nd -pHEKk http://www.pixelbeat.org/cmdline.html)	Store local browsable version of a page to the current dir
	wget -c http://www.example.com/large.file	Continue downloading a partially downloaded file
	wget -r -nd -np -l1 -A '*.jpg' http://www.example.com/dir/	Download a set of files to the current directory
	wget ftp://remote/file[1-9].iso/	FTP supports globbing directly
•	wget -q -O- http://www.pixelbeat.org/timeline.html | grep 'a href' | head	Process output directly
	echo 'wget url' | at 01:00	Download url at 1AM to current dir
	wget --limit-rate=20k url	Do a low priority download (limit to 20KB/s in this case)
	wget -nv --spider --force-html -i bookmarks.html	Check links in a file
	wget --mirror http://www.example.com/	Efficiently update a local copy of a site (handy from cron)
networking (Note ifconfig, route, mii-tool, nslookup commands are obsolete)
	ethtool eth0	Show status of ethernet interface eth0
	ethtool --change eth0 autoneg off speed 100 duplex full	Manually set ethernet interface speed
	iwconfig eth1	Show status of wireless interface eth1
	iwconfig eth1 rate 1Mb/s fixed	Manually set wireless interface speed
•	iwlist scan	List wireless networks in range
•	ip link show	List network interfaces
	ip link set dev eth0 name wan	Rename interface eth0 to wan
	ip link set dev eth0 up	Bring interface eth0 up (or down)
•	ip addr show	List addresses for interfaces
	ip addr add 1.2.3.4/24 brd + dev eth0	Add (or del) ip and mask (255.255.255.0)
•	ip route show	List routing table
	ip route add default via 1.2.3.254	Set default gateway to 1.2.3.254
•	host pixelbeat.org	Lookup DNS ip address for name or vice versa
•	hostname -i	Lookup local ip address (equivalent to host `hostname`)
•	whois pixelbeat.org	Lookup whois info for hostname or ip address
•	netstat -tupl	List internet services on a system
•	netstat -tup	List active connections to/from system
windows networking (Note samba is the package that provides all this windows specific networking support)
•	smbtree	Find windows machines. See also findsmb
	nmblookup -A 1.2.3.4	Find the windows (netbios) name associated with ip address
	smbclient -L windows_box	List shares on windows machine or samba server
	mount -t smbfs -o fmask=666,guest //windows_box/share /mnt/share	Mount a windows share
	echo 'message' | smbclient -M windows_box	Send popup to windows machine (off by default in XP sp2)
text manipulation (Note sed uses stdin and stdout. Newer versions support inplace editing with the -i option)
	sed 's/string1/string2/g'	Replace string1 with string2
	sed 's/\(.*\)1/\12/g'	Modify anystring1 to anystring2
	sed '/^ *#/d; /^ *$/d'	Remove comments and blank lines
	sed ':a; /\\$/N; s/\\\n//; ta'	Concatenate lines with trailing \
	sed 's/[ \t]*$//'	Remove trailing spaces from lines
	sed 's/\([`"$\]\)/\\\1/g'	Escape shell metacharacters active within double quotes
•	seq 10 | sed "s/^/      /; s/ *\(.\{7,\}\)/\1/"	Right align numbers
	sed -n '1000{p;q}'	Print 1000th line
	sed -n '10,20p;20q'	Print lines 10 to 20
	sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q'	Extract title from HTML web page
	sed -i 42d ~/.ssh/known_hosts	Delete a particular line
	sort -t. -k1,1n -k2,2n -k3,3n -k4,4n	Sort IPV4 ip addresses
•	echo 'Test' | tr '[:lower:]' '[:upper:]'	Case conversion
•	tr -dc '[:print:]' < /dev/urandom	Filter non printable characters
•	tr -s '[:blank:]' '\t' </proc/diskstats | cut -f4	cut fields separated by blanks
•	history | wc -l	Count lines
set operations (Note you can export LANG=C for speed. Also these assume no duplicate lines within a file)
	sort file1 file2 | uniq	Union of unsorted files
	sort file1 file2 | uniq -d	Intersection of unsorted files
	sort file1 file1 file2 | uniq -u	Difference of unsorted files
	sort file1 file2 | uniq -u	Symmetric Difference of unsorted files
	join -t'\0' -a1 -a2 file1 file2	Union of sorted files
	join -t'\0' file1 file2	Intersection of sorted files
	join -t'\0' -v2 file1 file2	Difference of sorted files
	join -t'\0' -v1 -v2 file1 file2	Symmetric Difference of sorted files
math
•	echo '(1 + sqrt(5))/2' | bc -l	Quick math (Calculate φ). See also bc
•	echo 'pad=20; min=64; (100*10^6)/((pad+min)*8)' | bc	More complex (int) e.g. This shows max FastE packet rate
•	echo 'pad=20; min=64; print((100E6)/((pad+min)*8))' | python	Python handles scientific notation
•	echo 'pad=20; plot [64:1518] (100*10**6)/((pad+x)*8)' | gnuplot -persist	Plot FastE packet rate vs packet size
•	echo 'obase=16; ibase=10; 64206' | bc	Base conversion (decimal to hexadecimal)
•	echo $((0x2dec))	Base conversion (hex to dec) ((shell arithmetic expansion))
•	units -t '100m/9.58s' 'miles/hour'	Unit conversion (metric to imperial)
•	units -t '500GB' 'GiB'	Unit conversion (SI to IEC prefixes)
•	units -t '1 googol'	Definition lookup
•	seq 100 | (tr '\n' +; echo 0) | bc	Add a column of numbers. See also add and funcpy
calendar
•	cal -3	Display a calendar
•	cal 9 1752	Display a calendar for a particular month year
•	date -d fri	What date is it this friday. See also day
•	[ $(date -d "tomorrow" +%d) = "01" ] || exit	exit a script unless it's the last day of the month
•	date --date='25 Dec' +%A	What day does xmas fall on, this year
•	date --date='@2147483647'	Convert seconds since the epoch (1970-01-01 UTC) to date
•	TZ='America/Los_Angeles' date	What time is it on west coast of US (use tzselect to find TZ)
•	date --date='TZ="America/Los_Angeles" 09:00 next Fri'	What's the local time for 9AM next Friday on west coast US
locales
•	printf "%'d\n" 1234	Print number with thousands grouping appropriate to locale
•	BLOCK_SIZE=\'1 ls -l	Use locale thousands grouping in ls. See also l
•	echo "I live in `locale territory`"	Extract info from locale database
•	LANG=en_IE.utf8 locale int_prefix	Lookup locale info for specific country. See also ccodes
•	locale -kc $(locale | sed -n 's/\(LC_.\{4,\}\)=.*/\1/p') | less	List fields available in locale database
recode (Obsoletes iconv, dos2unix, unix2dos)
•	recode -l | less	Show available conversions (aliases on each line)
	recode windows-1252.. file_to_change.txt	Windows "ansi" to local charset (auto does CRLF conversion)
	recode utf-8/CRLF.. file_to_change.txt	Windows utf8 to local charset
	recode iso-8859-15..utf8 file_to_change.txt	Latin9 (western europe) to utf8
	recode ../b64 < file.txt > file.b64	Base64 encode
	recode /qp.. < file.qp > file.txt	Quoted printable decode
	recode ..HTML < file.txt > file.html	Text to HTML
•	recode -lf windows-1252 | grep euro	Lookup table of characters
•	echo -n 0x80 | recode latin-9/x1..dump	Show what a code represents in latin-9 charmap
•	echo -n 0x20AC | recode ucs-2/x2..latin-9/x	Show latin-9 encoding
•	echo -n 0x20AC | recode ucs-2/x2..utf-8/x	Show utf-8 encoding
CDs
	gzip < /dev/cdrom > cdrom.iso.gz	Save copy of data cdrom
	mkisofs -V LABEL -r dir | gzip > cdrom.iso.gz	Create cdrom image from contents of dir
	mount -o loop cdrom.iso /mnt/dir	Mount the cdrom image at /mnt/dir (read only)
	cdrecord -v dev=/dev/cdrom blank=fast	Clear a CDRW
	gzip -dc cdrom.iso.gz | cdrecord -v dev=/dev/cdrom -	Burn cdrom image (use dev=ATAPI -scanbus to confirm dev)
	cdparanoia -B	Rip audio tracks from CD to wav files in current dir
	cdrecord -v dev=/dev/cdrom -audio -pad *.wav	Make audio CD from all wavs in current dir (see also cdrdao)
	oggenc --tracknum='track' track.cdda.wav -o 'track.ogg'	Make ogg file from wav file
disk space (See also FSlint)
•	ls -lSr	Show files by size, biggest last
•	du -s * | sort -k1,1rn | head	Show top disk users in current dir. See also dutop
•	du -hs /home/* | sort -k1,1h	Sort paths by easy to interpret disk usage
•	df -h	Show free space on mounted filesystems
•	df -i	Show free inodes on mounted filesystems
•	fdisk -l	Show disks partitions sizes and types (run as root)
•	rpm -q -a --qf '%10{SIZE}\t%{NAME}\n' | sort -k1,1n	List all packages by installed size (Bytes) on rpm distros
•	dpkg-query -W -f='${Installed-Size;10}\t${Package}\n' | sort -k1,1n	List all packages by installed size (KBytes) on deb distros
•	dd bs=1 seek=2TB if=/dev/null of=ext3.test	Create a large test file (taking no space). See also truncate
•	> file	truncate data of file or create an empty file
monitoring/debugging
•	tail -f /var/log/messages	Monitor messages in a log file
•	strace -c ls >/dev/null	Summarise/profile system calls made by command
•	strace -f -e open ls >/dev/null	List system calls made by command
•	strace -f -e trace=write -e write=1,2 ls >/dev/null	Monitor what's written to stdout and stderr
•	ltrace -f -e getenv ls >/dev/null	List library calls made by command
•	lsof -p $$	List paths that process id has open
•	lsof ~	List processes that have specified path open
•	tcpdump not port 22	Show network traffic except ssh. See also tcpdump_not_me
•	ps -e -o pid,args --forest	List processes in a hierarchy
•	ps -e -o pcpu,cpu,nice,state,cputime,args --sort pcpu | sed '/^ 0.0 /d'	List processes by % cpu usage
•	ps -e -orss=,args= | sort -b -k1,1n | pr -TW$COLUMNS	List processes by mem (KB) usage. See also ps_mem.py
•	ps -C firefox-bin -L -o pid,tid,pcpu,state	List all threads for a particular process
•	ps -p 1,$$ -o etime=	List elapsed wall time for particular process IDs
•	last reboot	Show system reboot history
•	free -m	Show amount of (remaining) RAM (-m displays in MB)
•	watch -n.1 'cat /proc/interrupts'	Watch changeable data continuously
•	udevadm monitor	Monitor udev events to help configure rules
system information (see also sysinfo) ('#' means root access is required)
•	uname -a	Show kernel version and system architecture
•	head -n1 /etc/issue	Show name and version of distribution
•	cat /proc/partitions	Show all partitions registered on the system
•	grep MemTotal /proc/meminfo	Show RAM total seen by the system
•	grep "model name" /proc/cpuinfo	Show CPU(s) info
•	lspci -tv	Show PCI info
•	lsusb -tv	Show USB info
•	mount | column -t	List mounted filesystems on the system (and align output)
•	grep -F capacity: /proc/acpi/battery/BAT0/info	Show state of cells in laptop battery
#	dmidecode -q | less	Display SMBIOS/DMI information
#	smartctl -A /dev/sda | grep Power_On_Hours	How long has this disk (system) been powered on in total
#	hdparm -i /dev/sda	Show info about disk sda
#	hdparm -tT /dev/sda	Do a read speed test on disk sda
#	badblocks -s /dev/sda	Test for unreadable blocks on disk sda
interactive (see also linux keyboard shortcuts)
•	readline	Line editor used by bash, python, bc, gnuplot, ...
•	screen	Virtual terminals with detach capability, ...
•	mc	Powerful file manager that can browse rpm, tar, ftp, ssh, ...
•	gnuplot	Interactive/scriptable graphing
•	links	Web browser
•	xdg-open .	open a file or url with the registered desktop application
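
The set operations entries in the table above can be tried on two small sample files. This is only a sketch: the working directory comes from mktemp, the file contents are made up, and the variable names are illustrative. As the table notes, each file is assumed to contain no duplicate lines.

```shell
# Build two small sample files in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\nb\nc\n' > file1
printf 'b\nc\nd\n' > file2

# paste -s joins the result lines with spaces so each set prints on one line.
union=$(sort file1 file2 | uniq | paste -s -d ' ' -)
intersection=$(sort file1 file2 | uniq -d | paste -s -d ' ' -)
# Listing file1 twice guarantees its lines are never unique,
# so only lines found solely in file2 survive uniq -u.
only_in_file2=$(sort file1 file1 file2 | uniq -u | paste -s -d ' ' -)
symmetric=$(sort file1 file2 | uniq -u | paste -s -d ' ' -)

echo "union:                $union"
echo "intersection:         $intersection"
echo "only in file2:        $only_in_file2"
echo "symmetric difference: $symmetric"
```

With these sample files the four results are "a b c d", "b c", "d" and "a d" respectively.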

More Linux Commands

http://www.pixelbeat.org/docs/linux_commands.html

My previous reference for practical Linux commands was surprisingly popular
with over 3.5 million hits in nearly 5 years. So I've decided to start compiling
another list of somewhat more involved/esoteric commands.

Examples marked with • are valid/safe to paste without modification into a terminal, so
you may want to keep a terminal window open while reading this so you can cut & paste.

Command		Description
•	grep . /proc/sys/net/ipv4/*	List the contents of flag files
•	set | grep $USER	Search current environment
•	tr '\0' '\n' < /proc/$$/environ	Display the startup environment for any process
•	echo $PATH | tr : '\n'	Display the $PATH one per line
•	kill -0 $$ && echo process exists and can accept signals	Check for the existence of a process (pid)
•	find /etc -readable | xargs less -K -p'*ntp' -j $((${LINES:-25}/2))	Search paths and data with full context. Use n to iterate
Low impact admin
#	apt-get install "package" -o Acquire::http::Dl-Limit=42 -o Acquire::Queue-mode=access	Rate limit apt-get to 42KB/s
	echo 'wget url' | at 01:00	Download url at 1AM to current dir
#	apache2ctl configtest && apache2ctl graceful	Restart apache if config is OK
•	nice openssl speed sha1	Run a low priority command (openssl benchmark)
•	renice 19 -p $$; ionice -c3 -p $$	Make shell (script) low priority. Use for non interactive tasks
Interactive monitoring
•	htop -d 5	Better top (scrollable, tree view, lsof/strace integration, ...)
•	iotop	What's doing I/O
#	watch -d -n30 "nice ps_mem.py | tail -n $((${LINES:-12}-2))"	What's using RAM
#	iftop	What's using the network. See also iptraf
#	mtr www.pixelbeat.org	ping and traceroute combined
Useful utilities
•	pv < /dev/zero > /dev/null	Progress Viewer for data copying from files and pipes
•	wkhtml2pdf http://.../linux_commands.html linux_commands.pdf	Make a pdf of a web page
•	timeout 1 sleep 3	run a command with bounded time. See also timeout
Networking
•	python -m SimpleHTTPServer	Serve current directory tree at http://$HOSTNAME:8000/ (Python 2; with Python 3 use python3 -m http.server)
•	openssl s_client -connect www.google.com:443 </dev/null 2>&0 | openssl x509 -dates -noout	Display the date range for a site's certs
•	curl -I www.pixelbeat.org	Display the server headers for a web site
#	lsof -i tcp:80	What's using port 80
#	httpd -S	Display a list of apache virtual hosts
•	vim scp://user@remote//path/to/file	Edit remote file using local vim. Good for high latency links
•	curl -s http://www.pixelbeat.org/pixelbeat.asc | gpg --import	Import a gpg key from the web
•	tc qdisc add dev lo root handle 1:0 netem delay 20msec	Add 20ms latency to loopback device (for testing)
•	tc qdisc del dev lo root	Remove latency added above
Notification
•	echo "DISPLAY=$DISPLAY xmessage cooker" | at "NOW +30min"	Popup reminder
•	notify-send "subject" "message"	Display a gnome popup notification
	echo "mail -s 'go home' P@draigBrady.com < /dev/null" | at 17:30	Email reminder
	uuencode file name | mail -s subject P@draigBrady.com	Send a file via email
	ansi2html.sh | mail -a "Content-Type: text/html" P@draigBrady.com	Send/Generate HTML email
Better default settings (useful in your .bashrc)
#	tail -s.1 -f /var/log/messages	Display file additions more responsively
•	seq 100 | tail -n $((${LINES:-12}-2))	Display as many lines as possible without scrolling
#	tcpdump -s0	Capture full network packets
Useful functions/aliases (useful in your .bashrc)
•	md () { mkdir -p "$1" && cd "$1"; }	Change to a new directory
•	strerror() { python -c "import os; print(os.strerror($1))"; }	Display the meaning of an errno
•	plot() { { echo 'plot "-"' "$@"; cat; } | gnuplot -persist; }	Plot stdin. (e.g: • seq 1000 | sed 's/.*/s(&)/' | bc -l | plot)
•	hili() { e="$1"; shift; grep --col=always -Eih "$e|$" "$@"; }	highlight occurrences of expr. (e.g: • env | hili $USER)
•	alias hd='od -Ax -tx1z -v'	Hexdump. (usage e.g.: • hd /proc/self/cmdline | less)
•	alias realpath='readlink -f'	Canonicalize path. (usage e.g.: • realpath ~/../$USER)
Multimedia
•	DISPLAY=:0.0 import -window root orig.png	Take a (remote) screenshot
•	convert -filter catrom -resize '600x>' orig.png 600px_wide.png	Shrink to width, computer gen images or screenshots
	mplayer -ao pcm -vo null -vc dummy /tmp/Flash*	Extract audio from flash video to audiodump.wav
	ffmpeg -i filename.avi	Display info about multimedia file
•	ffmpeg -f x11grab -s xga -r 25 -i :0 -sameq demo.mpg	Capture video of an X display
DVD
	for i in $(seq 9); do ffmpeg -i $i.avi -target pal-dvd $i.mpg; done	Convert video to the correct encoding and aspect for DVD
	dvdauthor -odvd -t -v "pal,4:3,720xfull" *.mpg;dvdauthor -odvd -T	Build DVD file system. Use 16:9 for widescreen input
	growisofs -dvd-compat -Z /dev/dvd -dvd-video dvd	Burn DVD file system to disc
Unicode
•	python3 -c "import unicodedata as u; print(u.name(chr(0x2028)))"	Lookup a unicode character
•	uconv -f utf8 -t utf8 -x nfc	Normalize combining characters
•	printf '\300\200' | iconv -futf8 -tutf8 >/dev/null	Validate UTF-8
•	printf 'ŨTF8\n' | LANG=C grep --color=always '[^ -~]\+'	Highlight non printable ASCII chars in UTF-8
•	fc-match -s "sans:lang=zh"	List font match order for language and style
Development
•	gcc -march=native -E -v -</dev/null 2>&1|sed -n 's/.*-mar/-mar/p'	Show autodetected gcc tuning params. See also gcccpuopt
•	for i in $(seq 4); do { [ $i = 1 ] && wget http://url.ie/6lko -qO-|| ./a.out; } | tee /dev/tty | gcc -xc - 2>/dev/null; done	Compile and execute C code from stdin
•	cpp -dM /dev/null	Show all predefined macros
•	echo "#include <features.h>" | cpp -dN | grep "#define __USE_"	Show all glibc feature macros
	gdb -tui	Debug showing source code context in separate windows
Extended Attributes (Note you may need to (re)mount with "acl" or "user_xattr" options)
•	getfacl .	Show ACLs for file
•	setfacl -m u:nobody:r .	Allow a specific user to read file
•	setfacl -x u:nobody .	Delete a specific user's rights to file
	setfacl --default -m group:users:rw- dir/	Set umask for a specific dir
	getcap file	Show capabilities for a program
	setcap cap_net_raw+ep your_gtk_prog	Allow gtk program raw access to network
•	stat -c%C .	Show SELinux context for file
	chcon ... file	Set SELinux context for file (see also restorecon)
•	getfattr -m- -d .	Show all extended attributes (includes selinux,acls,...)
•	setfattr -n "user.foo" -v "bar" .	Set arbitrary user attributes
BASH specific
•	echo 123 | tee >(tr 1 a) | tr 1 b	Split data to 2 commands (using process substitution)
	meld local_file <(ssh host cat remote_file)	Compare a local and remote file (using process substitution)
Multicore
•	taskset -c 0 nproc	Restrict a command to certain processors
•	find -type f -print0 | xargs -r0 -P$(nproc) -n10 md5sum	Process files in parallel over available processors
	sort -m <(sort data1) <(sort data2) >data.sorted	Sort separate data files over 2 processors
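
The kill -0 entry in the table above checks for a process without sending it a signal. A minimal sketch of wrapping that check in a function follows; the name pid_exists is hypothetical (it is not a standard command), and a background sleep stands in for a real long-running job.

```shell
# pid_exists is an illustrative helper name, not a standard utility.
pid_exists() {
    # kill -0 delivers no signal; its exit status only reports whether
    # the process exists and may be signalled by the current user.
    kill -0 "$1" 2>/dev/null
}

sleep 30 &                      # stand-in for a long-running job
child=$!

if pid_exists "$child"; then state_running=yes; else state_running=no; fi

kill "$child"                   # terminate the job
wait "$child" 2>/dev/null       # reap it so the pid really disappears

if pid_exists "$child"; then state_after=yes; else state_after=no; fi

echo "while running: $state_running, after exit: $state_after"
```

Note that a failing kill -0 can also mean the process exists but belongs to another user, so the check is most reliable for processes started by the same account.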

root Definition

http://www.linfo.org/root.html

root is the user name or account that by default has access to all commands and files on a Linux or other Unix-like operating system. It is also referred to as the root account, root user and the superuser.

The word root also has several additional, related meanings when used as part of other terms, and thus it can be a source of confusion to people new to Unix-like systems.

One of these is the root directory, which is the top level directory on a system. That is, it is the directory in which all other directories, including their subdirectories, and files reside. The root directory is designated by a forward slash ( / ).

Another is /root (pronounced slash root), which is the root user's home directory. A home directory is the primary repository of a user's files, including that user's configuration files, and it is usually the directory in which users find themselves when they log into a system. /root is a subdirectory of the root directory, as indicated by the forward slash that begins its name, and should not be confused with that directory. Home directories for users other than root are by default created in the /home directory, which is another standard subdirectory of the root directory.

Root privileges are the powers that the root account has on the system. The root account is the most privileged on the system and has absolute power over it (i.e., complete access to all files and commands). Among root's powers are the ability to modify the system in any way desired and to grant and revoke access permissions (i.e., the ability to read, modify and execute specific files and directories) for other users, including any of those that are by default reserved for root.

A rootkit is a set of software tools secretly installed by an intruder into a computer that allows such intruder to use that computer for its own, usually nefarious, purposes when desired. Well designed rootkits are able to obtain root access (i.e., access to the root account rather than just to a user account) and to hide most or all traces of their presence and activities.

The use of the term root for the all-powerful administrative user may have arisen from the fact that root is the only account having write permissions (i.e., permission to modify files) in the root directory. The root directory, in turn, takes its name from the fact that the filesystems (i.e., the entire hierarchy of directories that is used to organize files) in Unix-like operating systems have been designed with a tree-like (although inverted) structure in which all directories branch off from a single directory that is analogous to the root of a tree.

The original UNIX operating system, on which Linux and other Unix-like systems are based, was designed from the very beginning as a multi-user system because personal computers did not yet exist and each user was connected to the mainframe computer (i.e., a large, centralized computer) via a dumb (i.e., very simple) terminal. Thus it was necessary to have a mechanism for separating and protecting the files of the individual users while allowing them to use the system simultaneously. It was also necessary to have a means for enabling a system administrator to perform such tasks as entering user directories and files to correct individual problems, granting and revoking powers for ordinary users, and accessing critical system files to repair or upgrade the system.

Every user account is automatically assigned an identification number, the UID (i.e., user ID), by a Unix-like system, and the system uses these numbers instead of the user names to identify and keep track of the users. Root always has a UID of zero. This can be verified by logging in as root (if using a home computer or other system that permits this operation) and running the echo command to display the UID of the current user, i.e.,

echo $UID

echo is used to repeat on the screen what is typed in after it. The dollar sign preceding UID tells echo to display its value rather than its name.
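
The same check can be sketched in script form. This assumes a POSIX shell with the standard id utility, and the variable name current_uid is only illustrative. id -u is used because $UID is set by bash but not by every shell.

```shell
# Ask the system for the numeric UID of the current user.
current_uid=$(id -u)

# Root always has UID 0; every other account has a non-zero UID.
if [ "$current_uid" -eq 0 ]; then
    echo "running as root (UID 0)"
else
    echo "running as an ordinary user (UID $current_uid)"
fi
```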

The UID for root (as well as for all other users) can also be seen by looking at /etc/passwd, which is the configuration file for user data. This file can be viewed (by default by all users) by using the cat command (which is commonly employed to read files), i.e.,

cat /etc/passwd | less

The output of cat /etc/passwd in this example is piped (i.e., transferred) to the less command to allow it to be read one screenful at a time, which is useful if the file is a long one. The line of output for root will look something like root:x:0:0:root:/root:/bin/bash. The first column shows the user name and the third column shows the UID, which can be seen to be zero.
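
As a rough illustration of how the colon-separated columns can be picked apart, the sketch below runs cut on the sample root line quoted above instead of the real file, so the result does not depend on the system. The variable names are illustrative.

```shell
# Sample line copied from the text; on a real system read /etc/passwd instead.
line='root:x:0:0:root:/root:/bin/bash'

# Fields are separated by ':'; field 1 is the user name, field 3 the UID.
user=$(printf '%s\n' "$line" | cut -d: -f1)
uid=$(printf '%s\n' "$line" | cut -d: -f3)

echo "user=$user uid=$uid"

# The same two columns for every account on the system:
#   cut -d: -f1,3 /etc/passwd
```

For the sample line this prints user=root uid=0.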

The permissions system in Unix-like operating systems is set by default to prevent access by ordinary users to critical parts of the system and to files and directories belonging to other users. Thus, it can be very tempting for users new to such systems, especially those who are accustomed to systems with a weak permissions system or without any permissions system (e.g., Microsoft Windows or the older versions of the Macintosh), to bypass this permissions system on their personal computers by logging directly into the root account and staying there. Although this provides momentary relief, it should be avoided and ordinary work on the system should be done via an ordinary user account.

This is because it is very easy to damage a Unix-like system when using it as root -- much easier than to damage most other types of operating systems. The designers of most other operating systems devised methods of protecting the system and data to compensate for the lack of a robust permissions system.

However, an important principle of Unix-like operating systems is the provision of maximum flexibility to configure the system, and thus the root user is fully empowered. Unix-like systems assume that the system administrator knows exactly what he or she is doing and that only such individual(s) will be using the root account. Thus, there is virtually no safety net for the root user in the event of a careless error, such as damaging or deleting a critical system file (which could make the entire system inoperable).

Adding to the danger of routinely using the system as root is the fact that all processes (i.e., instances of programs in execution) started by the root user have root privileges. Because even the most widely used and well-tested application programs contain numerous programming errors (due to the huge amount of code required and its great complexity), a skilled attacker can often find and exploit such an error to obtain control of a system when a program is run with root privileges rather than using an ordinary user account, with its very limited privileges.

A critical means for preventing users from directly damaging Unix-like systems or increasing the vulnerability of such systems to damage by others is the avoidance of using the root account except when absolutely necessary, even by knowledgeable and experienced system administrators. That is, rather than routinely logging into the system as root, administrators should log in with their ordinary user accounts and then use commands, such as su, kdesu and sudo, that provide them with root privileges only as needed and without requiring a new login.

For example, to become root with su merely requires typing

su

at the command line (i.e., in the all-text mode), pressing the Enter key and supplying the root password. The account of the previous user can be returned to by pressing the Ctrl and d keys simultaneously or by typing the word exit and then pressing the Enter key.

The security associated with using su can be increased by using its -c option, which runs a single specified command as root and then returns immediately to the former user account once that command, and any program that it has launched, has finished.

Tasks that require root privileges include moving files or directories into or out of system directories (i.e., directories that are critical to the functioning of the operating system), copying files into system directories, granting or revoking user privileges, some system repairs, and the installation of some application programs. By default, it is not necessary to be root to be able to read most configuration files and documentation files in system directories, although it is necessary to be root to modify them.

Root privileges are usually required for installing software in RPM (Red Hat Package Manager) package format because of the need to write to system directories. If an application program is being compiled (i.e., converted into runnable form) from source code (i.e., its original, human-readable form), however, it can usually be configured to install and run from a user's home directory. Root privileges are not needed by ordinary users to compile and install software in their own home directories. Compiling software as root should be avoided for security reasons.

On large systems used by businesses and other organizations, there will likely be several system administrators. Each will have an individual account for ordinary work (the activities of which will be automatically recorded in system logs for security and repair purposes) but will also have access to the root account for use when necessary. The system administrator(s) might grant limited root privileges to some individuals, such as assistant administrators.

Hard Link Definition

http://www.linfo.org/hard_link.html

A hard link is merely an additional name for an existing file on Linux or other Unix-like operating systems.

Any number of hard links, and thus any number of names, can be created for any file. Hard links can also be created to other hard links. However, they cannot be created for directories, and they cannot cross filesystem boundaries or span across partitions.
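
The restriction on directories can be observed directly. The following is a small sketch, assuming a Linux system with mktemp available; the names dir1 and dlink1 are made up for illustration.

```shell
# Work in a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir dir1

# Attempt to give the directory a second name with a hard link.
if ln dir1 dlink1 2>/dev/null; then
    result="allowed"
else
    result="refused"
fi

echo "hard link to a directory: $result"
```

On Linux the attempt is expected to be refused, even for root.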

The operating system makes no distinction between the name that was originally assigned to a file when it was first created and any hard links that are subsequently created to that file other than that they are merely multiple names for the same file. This is because the original name and any hard links all point to the same inode. An inode is a data structure (i.e., an optimized way of storing information) that stores all the information about a file (e.g., its size, its access permissions, when it was created and where it is located on the system) except its name(s) and its actual data. Because inode numbers are unique only within a single filesystem, hard links do not work across filesystems and partitions.

Hard links are created with the ln command. For example, the following would create a hard link named hlink1 to a file named file1, both in the current directory (i.e., the directory in which the user is currently working):

ln file1 hlink1

When a hard link is created, there is no obvious indication that it is any different from any other file. That is, hard links appear to be files of the same type as their target files (i.e., the files to which they are linked) when they are viewed with commands such as ls (i.e., list) and file (which is used to determine the type of any specified files). Likewise, when viewed in a GUI (graphical user interface), the icons for hard links are identical to those for their target files.

That the initial name of a file and all hard links to that file all share the same inode can be clearly seen by using the ls command with its -i (i.e., inode) option. Thus, for example, the following would show that the inode numbers of file1 and hlink1 from the above example are identical:

ls -i file1 hlink1

The number of hard links to any file is shown in the second column of output produced by using ls with its -l (i.e., long) option. It can be seen that the number is the sum of the target file and any hard links to it (i.e., the sum of the initial name and any subsequently added names) and that it is the same for the target and for each such link.
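The two checks described above can be tried together in a scratch directory. The following is a minimal sketch rather than a definitive procedure; the temporary directory and the variable names (demo_dir, inode_file, inode_link, link_count) are assumptions made for illustration, while file1 and hlink1 are the names used in the example above.

```shell
# Work in a throwaway directory so that no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

echo "some data" > file1
ln file1 hlink1

# Both names should show the same inode number in the first column.
ls -i file1 hlink1

# The second column of the long listing is the hard link count.
ls -l file1 hlink1

# Capture the values so that they can be compared.
inode_file=$(ls -i file1 | awk '{print $1}')
inode_link=$(ls -i hlink1 | awk '{print $1}')
link_count=$(ls -l file1 | awk '{print $2}')
echo "inode of file1: $inode_file, inode of hlink1: $inode_link, links: $link_count"
```

With one original name and one added hard link, the link count reported for either name should be 2.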

Hard linked files can also be found by using the find command with its -type f option (to select only regular files) followed by its -links +1 option (to show all regular files with more than one hard link to them) as follows:

find -type f -links +1
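As a rough illustration of that search, the sketch below builds a small scratch directory in which only one file has an extra name and then runs the same find expression. The directory, the second file (file2) and the variable names are hypothetical; an explicit starting point (.) is given because some versions of find require one.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

echo "linked" > file1
echo "alone" > file2
ln file1 hlink1

# Lists ./file1 and ./hlink1 but not ./file2, which has only one name.
find . -type f -links +1

# Count the matches for a quick check.
multi_count=$(find . -type f -links +1 | wc -l)
echo "regular files with more than one link: $multi_count"
```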

When a change is made to the contents of a file, the linkage to all of the hard links is preserved. However, some text editors may break the link by creating a new inode for the revised contents,¹ and thus it can be prudent to check important links after modifying files.
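The difference between the two behaviors can be imitated from the shell. This is only a sketch of the general mechanism, not a description of how any particular editor is implemented: appending with >> stands in for an editor that modifies the file in place, and writing a new file and renaming it over the old name stands in for an editor that saves a copy. The file name file1.tmp and the variable names are assumptions.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

echo "v1" > file1
ln file1 hlink1

# In-place change: the inode is kept, so hlink1 sees the new line too.
echo "v2" >> file1
cat hlink1

# Save-by-replacement: write a new file, then rename it over file1.
printf 'v3\n' > file1.tmp
mv file1.tmp file1

# file1 now refers to a new inode; hlink1 still names the old one.
new_content=$(cat file1)
old_content=$(cat hlink1)
inode_file=$(ls -i file1 | awk '{print $1}')
inode_link=$(ls -i hlink1 | awk '{print $1}')
echo "file1 inode: $inode_file, hlink1 inode: $inode_link"
```

After the rename, the two names hold different contents, which is the kind of broken link that is worth checking for.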

The rm command superficially appears to remove or delete files. What it really does, however, is to reduce a file's hard link count (i.e., the number of names the file has) by one, and it does not directly affect the file's data. When the count reaches zero (and no running process still has the file open), the filesystem marks the inode and the space occupied by the data as free, and the file appears to have vanished because there is no longer any easy way to reference it. However, the file's data is only truly destroyed when the location(s) on the hard disk drive (HDD) or other storage media that contain it are overwritten by new data.

Thus, for example, the following would remove the hard link hlink1 that was created in the above example:

rm hlink1

Using rm again with the one remaining name as follows would then make the file's data virtually inaccessible:

rm file1
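The effect of the two rm commands on the link count can be observed with a short sketch such as the following. The scratch directory and the variable names (before, after, still_there) are assumptions made for illustration.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

echo "payload" > file1
ln file1 hlink1

before=$(ls -l file1 | awk '{print $2}')   # link count with two names
rm hlink1
after=$(ls -l file1 | awk '{print $2}')    # link count with one name left

# The data is still reachable through the remaining name.
still_there=$(cat file1)
echo "links before: $before, after: $after, contents: $still_there"

# Removing the last name leaves no ordinary way to reach the data.
rm file1
```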

Perhaps the most useful application for hard links is to allow files, programs and scripts (i.e., short programs) to be easily accessed in a different directory from the original file or executable file (i.e., the ready-to-run version of a program). Typing the name of the hard link will cause the program or script to be executed in the same way as using its original name.

Symbolic links, also called soft links, are more useful than hard links because they can be made to directories as well as to files on different filesystems and on different partitions. Moreover, when using a GUI, symbolic links have special icons that immediately identify them as being links rather than ordinary files. However, they have the disadvantage that they become unusable if their target file is deleted.
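The disadvantage mentioned above can be seen by deleting a file that has both kinds of link. The following is a minimal sketch; slink1 is a hypothetical name for the symbolic link, created with the -s option of ln, and the other variable names are likewise assumptions.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

echo "target data" > file1
ln file1 hlink1       # hard link: another name for the same inode
ln -s file1 slink1    # symbolic link: a small file that stores the name file1

rm file1

# The hard link still reaches the data.
hard_content=$(cat hlink1)

# The symbolic link still exists (-L) but its target does not (-e follows the link).
if [ -L slink1 ] && [ ! -e slink1 ]; then
    dangling=yes
else
    dangling=no
fi
echo "hard link contents: $hard_content, symbolic link dangling: $dangling"
```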

Aliases superficially resemble hard links in that they are another way of providing multiple names for any file. However, the alias command is built into the shell (i.e., the program that provides the text-only user interface) rather than being a separate program and the mechanism is very different from that of hard links. Like symbolic links, aliases can be used not only for files but also for directories and can cross filesystem and partition boundaries. In addition, an alias can be used as a short name for any shell text (i.e., a command or series of linked commands, inclusive of their options and/or arguments).

________
¹Tests on Red Hat Linux 9 found that hard links were broken when modifying files using the gedit text editor. However, they were not broken when using the vi and Abiword text editors as well as the KHexEdit hex editor on the same version of Linux. The failure of gedit to preserve hard links was due to the fact that it actually creates a copy of the modified file that it saves (and thus the new inode number) rather than making the changes to the original file, but this copy is given the name of the original file. However, a similar test using a newer version of gedit (2.14.0) on Fedora Core 5 showed that the problem had been corrected and that there was no breakage of links.

Characters: A Brief Introduction

http://www.linfo.org/character.html

Characters are the basic symbols that are used to write or print a language. For example, the characters used by the English language consist of the letters of the alphabet, numerals, punctuation marks and a variety of symbols (e.g., the ampersand, the dollar sign and the arithmetic symbols).

Characters are fundamental to computer systems. They are used (1) for input (e.g., through the keyboard or through optical scanning) and output (e.g., on the screen or on printed pages), (2) for writing programs in programming languages, (3) as the basis of some operating systems (such as Linux) which are largely collections of plain text (i.e., human-readable character) files and (4) for the storage and transmission of non-character data (e.g., the transmission of images by e-mail using base64).

Issues regarding characters and their use with computers are relatively simple if dealing with a single language, such as English, which has a small number of characters. However, they become quite complex when dealing with internationalization and localization because of the diverse array of writing systems and vast number of characters in use throughout the world. Internationalization is the addition of a framework for support for multiple languages and cultures; localization is the adjustment of language, content and design to specific countries, regions or cultures.

Character Sets

A character set is the collection of characters that is used to write a particular language. Most languages have a single character set, and similar character sets are often used by a number of languages (e.g., variants of the Roman alphabet are used to write English, Spanish, Finnish, Dutch, etc.).

A few languages have, or have had, more than one character set. For example, the Japanese language uses three character sets: the main one is Chinese characters (i.e., the characters that are used to write the Chinese language), but it is supplemented with two syllabaries (called hiragana and katakana). The Korean language is now written mainly with a unique alphabet (called Hangul), but Chinese characters are still occasionally used.

Mongolia is attempting to restore its traditional alphabet, which was replaced by the Cyrillic alphabet (used to write Russian) in the 1940s under Soviet influence, and thus both character sets are currently in use. Turkey used an Arabic alphabet until 1928, at which time it was replaced by an alphabet based on the Roman alphabet as part of a political decision to become more westernized.

Characters and Glyphs

Characters should not be confused with glyphs (although they sometimes are). A glyph is a visual representation (i.e., appearance) of a character and is determined by the typeface and style in which the character is printed. In general, any character can have a number of glyphs, with the number depending on the language.

A typeface is a specific, coordinated design for the entire set of characters that is used to write a language or languages. Some typefaces are available in several styles, such as most of those used to write English and other Western European languages, which are usually available in plain, bold and italic.

Different writing systems use different typefaces, and the number of typefaces varies according to the writing system and language. Thousands of typefaces have been developed for use by English and other Western European languages, and they range all the way from the very simple sans serif Geneva and the monospaced Courier (which was widely used for typewriters) to Times (which is frequently used in printing periodicals and books) to the highly ornate Gothic (which is used mainly for decorative purposes). Characters written in sans serif typefaces lack the little hooks (i.e., serifs) on the ends of their strokes that are widely believed to make text easier to read.

Some characters in some languages can look very different according to the combination of typeface and style that are used to write them, and in some cases they may closely resemble other characters. Yet, it is only the glyph of a character that resembles another character, and the character itself (including its meaning and usage) is distinct.

Classification of Characters

Most writing systems can be broadly classified into one of three categories: alphabetic, syllabic and logographic. The vast majority of written languages that exist today use alphabets.

An alphabet is the complete, ordered, standardized set of letters that is used to write or print a written language. Each letter represents one or more phonemes (i.e., the fundamental sounds of a spoken language) and/or is used in combination with other letters to represent a phoneme. Most alphabets in use today are based on the Roman alphabet, which was used by the ancient Romans to write their Latin language.

A syllabary is a set of characters that represent the syllables of a language, with one distinct character for each possible syllable. A syllable is the next largest unit of sound in a language after a phoneme; it consists of a vowel sound or a vowel-consonant combination. Syllabaries typically contain many more characters than do alphabets. They are best suited to languages with relatively simple syllable structures, such as Japanese, which has only about a hundred syllables. The English language, in contrast, contains a relatively large number of vowels and complex consonant clusters, resulting in thousands of syllables.

The third major type of writing system, logographic, uses characters that represent objects or abstract ideas. This type of writing system is popularly referred to as pictographic or ideographic. The most important modern logographic writing system by far is Chinese, whose characters are also used, with varying degrees of modification, in Japanese and Korean (as a supplement to Hangul). The ancient Sumerians, Egyptians and Mayans also used logographic systems.

These three categories are not rigid. For example, the Chinese writing system is not purely logographic. This is because individual characters are often compounds which consist of an element that represents the meaning and an element that represents the pronunciation. Also, combinations of characters are sometimes used mainly for their phonetic values to represent proper nouns (e.g., names of people or places) from other languages.

Likewise, alphabetic and syllabic scripts frequently make some use of logograms and logographic values. The most common example is Arabic numerals, each of which has the same meaning regardless of which language or dialect it is used in and how it is pronounced. Other examples are symbols such as the ampersand and dollar sign. Also, individual letters sometimes have more than just a phonetic value: for example, in the English language the letter A often indicates high quality and the letter X sometimes indicates the unknown or an adult rating.

Origin of Characters

The oldest known writing system is cuneiform (named after the wedge-like shapes of the characters that were formed in clay tablets with reed styluses), which emerged in Sumer (in the southern part of what is now Iraq) more than 5,000 years ago. It was followed closely by the development of writing in Egypt and the Indus valley (in what is now Pakistan and northwestern India).

Chinese characters were apparently invented independently of characters used in the Middle East. They first appeared more than three thousand years ago, and they have been in use continuously in basically the same form ever since.

Most scholars believe that the first alphabets originated in the Near East, perhaps evolving from, or at least being influenced by, cuneiform or Egyptian hieroglyphics. The first widely used alphabet appears to have been that of the Phoenicians (who originated in what is now Lebanon), which was in use by at least 1200 BC. That alphabet contained 22 letters for consonant sounds and had no letters for vowels (as is the case with the Hebrew and Arabic alphabets, which descended from it). The Phoenicians spread their alphabet around the Mediterranean, including to the Greeks and the Etruscans (who preceded the Romans in Italy).

The Roman alphabet was adapted mainly from the Etruscan alphabet during the 7th century BC. It had only upper case (i.e., capital) letters, and there were neither punctuation marks nor spaces between words. Numbers were written with seven letters of the alphabet (i.e., Roman numerals) rather than with Arabic numerals.

Arabic numerals are today by far the most commonly used characters to represent numbers, although there are also other systems for writing numerals that are still in use, including Chinese and Thai. Arabic numerals were originally derived from an Indian system of writing numerals, and there is some speculation that the Indian numerals, in turn, originally came from Chinese characters.

Characters were also invented apparently independently in the Americas. In particular, the Mayans had a highly developed writing system that contained a large number of complex, logographic characters.

Numbers of Characters

The size of a character set varies widely according to the language. Languages written with alphabets usually have the fewest characters and those using logographic writing systems have the most. Among the former, the smallest alphabet (and thus the smallest total number of characters) is that of the Rotokas language (spoken on Bougainville, an island to the east of mainland Papua New Guinea), which contains only eleven letters, and the largest alphabet is Armenian, with 39 letters.

The Chinese language has by far the largest number of characters of any writing system that has ever existed, and it accounts for the vast bulk of the characters in use in the world today. Chinese contains more than 40,000 characters, and some estimates place the total at close to 60,000. However, most of these are rarely used, and well-educated people generally know only about 5,000.

The Japanese language ranks second in terms of the number of characters because it makes heavy use of Chinese characters. Approximately 2000 such characters are taught during primary and secondary school, and a well-educated person will know at least 3500 characters. Hiragana and katakana, the two syllabaries that are used to supplement the Chinese characters, each contain 46 characters.

In South Korea, middle and high school students study 1,800 to 2,000 Chinese characters, but most people use Hangul almost exclusively in their daily lives. Chinese characters are used mainly for personal and place names, for calligraphy and for clarification of some terms written in Hangul.

Characters and Computers

The vast number of characters and the great diversity of writing systems in use around the world present some major challenges for the development of software. This has become an increasingly important issue as a result of the rapid growth in the use of computers in countries that do not use European languages.

ASCII (an acronym for American Standard Code for Information Interchange and pronounced ask-ee) is the de facto encoding (i.e., set of code numbers) used by computers and communications equipment to represent text. It is a single byte (i.e., eight bits) encoding system (i.e., uses one byte to represent each character), and the use of the first seven bits allows it to represent a maximum of 128 characters. ASCII is based on the characters used to write the English language (including both upper and lower case letters). Extended versions (which utilize the eighth bit to provide a maximum of 256 characters) have been developed for use with other character sets.
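The code numbers and the limits described above can be checked from the shell. The sketch below relies on the behavior of the standard printf utility, which prints the numeric code of a character when the argument begins with a single quote; the variable names are assumptions made for illustration.

```shell
# Numeric ASCII codes of an upper case and a lower case letter.
code_A=$(printf '%d' "'A")
code_a=$(printf '%d' "'a")

# Seven bits give 2^7 code numbers; using the eighth bit gives 2^8.
ascii_size=$((1 << 7))
extended_size=$((1 << 8))

echo "A is $code_A, a is $code_a"
echo "7-bit ASCII: $ascii_size codes, 8-bit extensions: $extended_size codes"
```

Both letter codes are below 128, so each fits in the seven bits that ASCII uses.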

Although ASCII is one of the most successful software standards ever developed, its limitations have become increasingly apparent as a result of the growing internationalization and localization of software. It is suitable for use only with languages that have very small character sets, and is not well suited for computer systems which simultaneously use multiple character sets.

Consequently, Unicode was developed as a means of allowing computers to deal with the full range of characters used by human languages. It has a goal of providing a unique encoding for every character that currently exists or that has ever existed (but not for their variant glyphs). This is accomplished by allowing a character to be represented with more than one byte, thus vastly increasing the total number of possible unique character encodings. Unicode version 2.0 (released in 1996) listed 38,885 characters, version 3.0 (released in 2000) listed 49,194 and version 4.0 (released in 2003) lists 96,382. Although Unicode has achieved considerable success, it remains a work in progress.
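The use of more than one byte per character can be observed by counting bytes. The sketch below assumes UTF-8, which is one widely used way of storing Unicode text and is not named in the text above. The characters are written as octal byte escapes so that the result does not depend on how this file itself is encoded; the variable names are hypothetical.

```shell
# Count the bytes used to store a single character in UTF-8.
bytes_ascii=$(printf 'A' | wc -c | tr -d ' ')             # Latin capital letter A
bytes_latin=$(printf '\303\251' | wc -c | tr -d ' ')      # e with acute accent (U+00E9)
bytes_cjk=$(printf '\344\270\255' | wc -c | tr -d ' ')    # Chinese character U+4E2D

echo "A: $bytes_ascii byte(s), U+00E9: $bytes_latin, U+4E2D: $bytes_cjk"
```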

A number of issues with regard to the use of characters and writing systems by computers have yet to be completely resolved. They include (1) controversies in the case of some Chinese characters regarding what is the underlying character and what is the variant glyph, (2) efficient keyboard input systems for languages that use large numbers of characters, (3) software that will allow easy input and display of characters that are arranged other than horizontally from left to right (e.g., right to left or vertically), (4) political and nationalistic controversies about characters, (5) characters that can have multiple forms according to where they are used in words and (6) languages that use multiple character sets.

Byte Definition

http://www.linfo.org/byte.html

A byte (represented by the upper-case letter B) is a contiguous sequence of a fixed number of bits that is used as a unit of memory, storage and instruction execution in computers.

A bit (represented by a lower case b) is the most basic unit of information in computing and communications. Every bit has a value of either zero or one. Although computers usually provide ways to test and manipulate single bits, they are almost always designed to store data and execute instructions in terms of bytes.

The number of bits in a byte varied according to the model of computer and its operating system in the early days of computing. For example, the PDP-7, for which the first version of UNIX was written, was built around 18-bit words rather than eight-bit bytes. Today, however, a byte virtually always consists of eight bits.

Whereas a bit can have only one of two values, an eight-bit byte (also referred to as an octet) can have any of 256 possible values, because there are 256 possible permutations (i.e., combinations of zero and one) for eight successive bits (i.e., 2⁸). Thus, an eight-bit byte can represent any unsigned integer from zero through 255 or any signed integer from -128 to 127. It can also represent any character (i.e., letter, number, punctuation mark or symbol) in a seven-bit or eight-bit character encoding system, such as ASCII (the default character coding used on most computers).

Multiple bytes are used to represent larger numbers and to represent characters from larger character sets. For example, two bytes (i.e., 16 bits) can store any one of 65,536 (i.e., 2¹⁶) possible values, that is, the unsigned integers from 0 through 65,535 or the signed integers from -32,768 through 32,767. Likewise, the range of integer values that can be stored in 32 bits is 0 through 4,294,967,295, or -2,147,483,648 through 2,147,483,647.
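The ranges quoted above all follow from the same arithmetic: n bits give 2ⁿ values, half of which are negative when the number is signed. The sketch below recomputes them with shell arithmetic. The function name int_ranges is hypothetical, and the sketch assumes a shell whose arithmetic is at least 64 bits wide, as is usual on current Linux systems.

```shell
# Print the number of values and the unsigned and signed ranges for a bit width.
int_ranges() {
    bits=$1
    count=$((1 << bits))     # 2 to the power of bits
    half=$((count / 2))
    echo "$bits bits: $count values, unsigned 0..$((count - 1)), signed -$half..$((half - 1))"
}

int_ranges 8
int_ranges 16
int_ranges 32
```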

A maximum of 32 bits is required to represent a character encoded in Unicode, which is an attempt to provide a unique encoding (i.e., identification number) for every character currently or historically used by the world's languages. However, the majority of the world's languages only need a single-byte character encoding because they use alphabetic scripts, which generally have fewer than 256 characters.

The word byte can also refer to a datatype (i.e., category of data) in certain programming languages and database systems. The C programming language, for example, has no datatype that is actually named byte, but its standard defines a byte as the unit of storage occupied by a char, and thus the unsigned char datatype, which is an integer datatype capable of holding at least 256 different values, commonly serves as a byte type.

Kilobytes, Megabytes, Gigabytes, Terabytes, Petabytes

Because bytes represent a very small amount of data, for convenience they are commonly referred to in multiples, particularly kilobytes (represented by the upper-case letters KB or just K), megabytes (represented by the upper-case letters MB or just M) and gigabytes (represented by the upper-case letters GB or just G).

A kilobyte is 1,024 bytes, although it is often used loosely as a synonym for 1,000 bytes. A megabyte is 1,048,576 bytes, but it is frequently used as a synonym for one million bytes. For example, a computer that has a 256MB main memory can store approximately 256 million bytes (or characters) in memory at one time. A gigabyte is equal to 1,024 megabytes.

One terabyte (TB) is equal to 1024 gigabytes or roughly one trillion bytes. One petabyte is equal to 1024 terabytes or about a million gigabytes. Some supercomputers now have a petabyte hard disk drive (HDD) capacity and a multipetabyte tape storage capacity. The prefix peta is an alteration of penta, the Greek word for five.
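The multiples described above can be worked out with shell arithmetic, using the binary meaning of each prefix (a factor of 1,024 at each step). This is a small sketch with hypothetical variable names; it assumes 64-bit shell arithmetic. It also shows that the 256MB memory in the example above holds somewhat more than 256 million bytes.

```shell
kb=1024
mb=$((kb * 1024))
gb=$((mb * 1024))
tb=$((gb * 1024))
pb=$((tb * 1024))

# Exact size of a 256MB main memory, in bytes.
mem_256mb=$((256 * mb))

echo "KB=$kb MB=$mb GB=$gb"
echo "TB=$tb PB=$pb"
echo "256MB = $mem_256mb bytes"
```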

An exabyte is 1024 times larger than a petabyte. The prefix exa is an alteration of hexa, the Greek word for six. As of 2005, exabytes of data are rarely encountered in a practical context. For example, the total amount of printed material in the world is estimated to be around a fifth of an exabyte. However, the total amount of digital data that is now created, captured and replicated worldwide might be several hundred exabytes per year.

Origins

The term byte was coined by Werner Buchholz, a researcher at IBM, in 1956 during the early design phase for the IBM Stretch, the company's first supercomputer. It was a modification of the word bite that was intended to avoid accidentally misspelling it as bit. In 1962 Buchholz described a byte as "a group of bits used to encode a character, or the number of bits transmitted in parallel to and from input-output units."

Byte is also sometimes considered a contraction of BinarY digiT Eight. IBM used to teach that a Binary Yoked Transfer Element (BYTE) was formed by a series of bits joined together "like so many yoked oxen." Binary refers to the fact that computers perform all their computations with the base 2 numbering system (i.e., only zeros and ones), in contrast to the decimal system (i.e., base 10), which is commonly used by humans.

The movement toward an eight-bit byte began in late 1956. A major reason that eight was considered the optimal number was that seven bits can define 128 characters (as against only 64 characters for six bits), which is sufficient for the approximately 100 unique codes needed for the upper and lower case letters of the English alphabet as well as punctuation marks and special characters, and the eighth bit could be used as a parity check (i.e., to confirm the accuracy of the other bits).
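The role of the eighth bit as a parity check can be sketched as follows. The text above does not say which parity convention was used, so this example assumes even parity (i.e., the parity bit is chosen so that the total number of one bits is even); the function name parity_bit is hypothetical.

```shell
# Print the even-parity bit for a 7-bit code: 1 if the code has an odd
# number of one bits, 0 otherwise.
parity_bit() {
    value=$1
    ones=0
    while [ "$value" -gt 0 ]; do
        ones=$((ones + (value & 1)))   # add the lowest bit
        value=$((value >> 1))          # move on to the next bit
    done
    echo $((ones % 2))
}

echo "six bits:   $((1 << 6)) codes"
echo "seven bits: $((1 << 7)) codes"
echo "parity bit for code 65 (A, binary 1000001): $(parity_bit 65)"
echo "parity bit for code 67 (C, binary 1000011): $(parity_bit 67)"
```

A receiver that recomputes the parity bit and finds a mismatch knows that at least one bit was changed in transit.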

This size was later adopted by IBM's highly popular System/360 series of mainframe computers, which was announced in April 1964, and this was a key factor in its eventually becoming the industry-wide standard.

If computers were used for nothing other than binary calculations, as some once were, there would be no need for bytes. However, because they are extensively used to manipulate character-based information, it is necessary to have encodings for those symbols, and thus bytes are necessary.

PDP-7 Definition

http://www.linfo.org/pdp-7.html

The PDP-7 was a minicomputer which was shipped by Digital Equipment Corporation (DEC) in 1965. Its greatest claim to fame by far is that it is the computer for which the first version of UNIX was created.

Minicomputers were third generation computers that made efficient use of discrete transistors and magnetic core memories (i.e., arrays of tiny rings made from a magnetic ceramic material) in place of vacuum tubes to reduce their size, purchase price and operating cost to only small fractions of those for the mainframe computers which still dominated high performance computing.

Minicomputers also bridged the huge performance gap between the high capacity mainframes and the low powered microcomputers. The latter were relatively simple, single-user machines that ran simple operating systems such as CP/M or MS-DOS. Minicomputers, in contrast, ran full multi-user, multitasking operating systems such as VMS (developed by DEC) and UNIX.

A multitasking operating system is one in which multiple processes can execute (i.e., run) on a single computer seemingly simultaneously and without interfering with each other. A process, also referred to as a task, is a running instance of a program.
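On a modern Unix-like system this can be seen by starting two programs in the background, each of which becomes a separate process with its own process ID. The following is a minimal sketch; the use of sleep as a stand-in program and the variable names are assumptions.

```shell
# Start two instances of the same program; each is a separate process.
sleep 1 &
pid_a=$!
sleep 1 &
pid_b=$!

echo "two processes running at once: $pid_a and $pid_b"

# Wait for both to finish; the status is that of the last one named.
wait "$pid_a" "$pid_b"
status=$?
echo "both finished, exit status $status"
```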

Established in 1957 by three graduates of the Massachusetts Institute of Technology (MIT) and initially operated out of an old wool mill in Maynard, Massachusetts, DEC was a pioneer in the U.S. computer industry. At its peak in 1990 it employed more than 120,000 people worldwide and earned more than $14 billion in revenue. The company was acquired in 1998 by Compaq Computer Corporation, which subsequently merged with Hewlett-Packard in 2002.

The PDP Series

During the 1960s DEC introduced its influential PDP series of minicomputers that featured magnetic core memories and a variety of other advanced technologies. A key to the success of this series was its suitability for that large market segment that could not afford mainframes.

PDP was an abbreviation for Programmed Data Processor. The company did not want its machines to be called computers because a study had predicted that the world market for computers would be very small, perhaps less than a hundred. It was the conventional wisdom of the time (even among the government and DEC's stockholders) that computers were big and expensive and required a dedicated computer center and a large supporting staff. DEC chose to avoid dealing with these stereotypes by entirely avoiding the term computer.

DEC's first computer, the PDP-1, entered production in 1960. It was priced at only $120,000 for a basic system at a time when other computers typically sold for well in excess of a million dollars. It also featured low operating costs and ease of use, including the ability to be operated by a single person and the building in of a CRT (cathode ray tube) display on which images could be drawn using an accompanying light pen.

The PDP-1 had a time-sharing operating system and a magnetic core memory that held 4096 words of 18 bits each. Memory capacity could be expanded in increments of 4096 words to a maximum of 65,536 words, and it could be supplemented by up to 24 magnetic tape drive storage units. Driving currents were automatically adjusted to compensate for temperature variations between 50 and 110 degrees Fahrenheit.

The PDP-1 is perhaps best remembered today for being the computer most important in the creation of the early hacker (i.e., computer expert) culture at MIT and elsewhere. It was also significant in that in 1962 it became the first computer used for playing a computer game, Steve Russell's Spacewar.

The PDP-1 was followed by a succession of models with a wide range of prices and performance levels. The most powerful of these were fully worthy of the large computer centers with big support staffs that were required by mainframes. Some early models, such as the PDP-3, were not actually built by DEC itself but rather by customers using DEC parts and facilities.

The PDP-4, DEC's second 18-bit model, was introduced in 1963 as a cheaper, but slower, alternative to the PDP-1. It was not commercially successful, with only about 54 units being sold. However, all of the company's subsequent 18-bit PDP models were based on its simplified instruction set (i.e., the set of commands that the computer's processor can understand and execute).

Among the more notable of the PDP-1's successors was the PDP-6, which was shipped in 1964. It was a large, high capacity model and sported a 36-bit word size. The price was the same as for the PDP-1, and approximately 23 units were built.

The PDP-7

The PDP-7, which was introduced in 1965, was developed as a less expensive alternative to the PDP-4, with a price of only U.S.$72,000 for a minimal system. It also had an 18-bit word length, and its standard main memory was 4K words (equivalent to nine kilobytes) but upgradeable to 64K words (144 KB). Minuscule by today's standards, this amount of core memory, which served as the system's RAM (random access memory), was considered substantial at the time, especially given the low price of the system.
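The kilobyte equivalents quoted above come from multiplying the number of words by 18 bits and dividing by eight bits per byte and by 1,024 bytes per kilobyte. A short sketch of that arithmetic follows; the function name words_to_kb is hypothetical.

```shell
# Convert a count of 18-bit words into kilobytes of eight-bit bytes.
words_to_kb() {
    echo $(( $1 * 18 / 8 / 1024 ))
}

echo "4K words  = $(words_to_kb 4096) KB"
echo "64K words = $(words_to_kb 65536) KB"
```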

The PDP-7's CPU (central processing unit) was implemented using a large number of small circuit cards. As was the case with DEC's earlier models, all logic was formed from discrete components (i.e., individual transistors, diodes and resistors); that is, no integrated circuits (ICs) were used.

The PDP-7 also featured DEC's first mass storage-based operating system. This was made possible by the company's new DECtape random access, block addressable, small format magnetic tape system. For the first time, tape was divided into sectors so that it could be used as an input/output storage system that was both interactive and inexpensive. The tapes were used in a way similar to how floppy disks were later used.

As with its predecessors, input and output were conducted via a teletypewriter (also referred to as a teletype machine), a kind of electromechanical typewriter that was commonly used for communication, and a punched paper tape drive unit was included for low cost storage of programs and data. There was also a high quality DEC 340 CRT display unit with a round, ten-inch screen which could draw simple vector graphics, and an accompanying light pen could be used to draw on the display screen.

The system also included an advanced Fortran II compiler, a symbolic assembler, a text editor, a debugging system, maintenance routines and a library of arithmetic, utility and programming aids that had been developed on the PDP-4. Fortran is a programming language that was developed in the 1950s and which is still widely used for scientific and numerical applications. A compiler is a specialized program for converting source code into machine code that a CPU can directly understand and execute. An assembler is a computer program for translating an assembly language into machine code. Source code is the original form in which software is written in a programming language prior to being compiled, and assembly languages are a type of low level (i.e., very close to machine code but easier for humans to read and write) programming language.

The PDP-7 was well received in data acquisition and laboratory applications, and it was considered sufficiently reliable (at least when properly programmed) to be suitable even for use in the control of nuclear reactors. Ultimately, 120 of the systems were produced and sold.

The PDP-7 and the Birth of UNIX

In 1969 Ken Thompson wrote the first version of UNIX in assembly language using an otherwise little-used PDP-7 at Bell Labs, the research arm of AT&T, the former U.S. telecommunications monopoly. One of the factors that made this possible was the proficiency that he had gained with that system while writing an early computer game called Space Travel. This game, incidentally, became one of the first programs to run on UNIX.

However, the PDP-7 was already obsolete when it was used for creating the first version of UNIX, and thus in 1970 the UNIX group proposed purchasing a PDP-11 for $65,000. The PDP-11, which had just been launched that year, incorporated some important advances (including greater ease of programming), and it became a highly successful and influential model. It was DEC's first and only 16-bit system.

In 1971, the group used their new PDP-11 to rewrite UNIX in a high-level language, instead of its original assembly language, so that it could more easily be ported to (i.e., transferred to) other types of computers. They briefly tried using Fortran before creating their own language based on BCPL (Basic Combined Programming Language), which they called B. They then extended B to produce the C language, which is still in widespread use today, after which they rewrote the UNIX source code in C. This made it easier to port UNIX to run on new hardware, as all that was needed was a C compiler to convert its C source code into the machine code for the specific type of computer.

The severe limitations of the PDP series and other computers of the day forced Thompson and Ritchie to be ruthlessly efficient in their designs for UNIX and C, as was the case with the other operating systems and languages of that era (although those other systems and languages have long since faded from use). Despite the fact that memory sizes, processor speeds, data access times, storage capacities and display capabilities have grown vastly greater than could have even been imagined at that time, this extreme efficiency has continued to serve Unix-like operating systems (i.e., the descendants and clones of the original UNIX) well and is widely acknowledged to be an important factor in the enduring and growing success of such systems.

The PDP-15, which was shipped in 1970, was DEC's final 18-bit computer, and it was the only one that was implemented with integrated circuits rather than discrete components. It was the largest selling of all the PDP models, with more than 400 units ordered in just the first eight months of production.

There are still a few PDP-7s in operable condition, including one that is currently being restored in Oslo, Norway.

The umount Command

http://www.linfo.org/umount.html

The umount command is used to manually unmount filesystems on Linux and other Unix-like operating systems.

A filesystem in this context is a hierarchy of directories that is located on a single partition (logically independent section of a hard disk drive) or other device, such as a CD-ROM, DVD, floppy disk or USB key drive, and has a single filesystem type (i.e., method for organizing data).

Mounting refers to logically attaching a filesystem to a specified location on the currently accessible (and thus already mounted) filesystem(s) on a computer system so that its contents can be accessed by users. Unmounting refers to logically detaching a filesystem from the currently accessible filesystem(s).

All mounted filesystems are unmounted automatically when a computer is shut down in an orderly manner. However, there are times when it is necessary to unmount an individual filesystem while a computer is still running. A common example is when it is desired to remove an external device such as a USB key drive; if such a device is removed before the filesystem on it is properly unmounted, any data recently written to it might not be saved.

The basic syntax of umount is

umount [options] filesystem

umount is most commonly used without any of its several options. The filesystem is identified by the full pathname of the directory in which it has been mounted, not by its type. Thus, for example, to unmount a filesystem that is mounted in a directory called /dir1, all that would be necessary is to type in the following at the keyboard and press the Enter key:

umount /dir1

Likewise, a USB key device, assuming that it had been mounted in the directory /mnt/usb, would be unmounted with the following:

umount /mnt/usb
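
Because umount identifies a filesystem by the directory in which it is mounted, a script may first want to confirm that something is actually mounted there. The following is a minimal sketch and not part of umount itself: the function name is_mounted_in is hypothetical, and it assumes a mount table in the format of /proc/mounts (device, directory, type and so on, one filesystem per line), which is normally available on Linux. Directory names containing spaces are stored in escaped form in that table and are not handled here.

```shell
# Return success if directory $1 appears as a mount point in the mount
# table $2 (defaults to /proc/mounts; assumed format: device dir type ...).
is_mounted_in() {
    dir=$1
    table=${2:-/proc/mounts}
    # The second whitespace-separated field of each line is the mount directory.
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$table"
}

# Hypothetical usage: only attempt the unmount when /mnt/usb is mounted.
if is_mounted_in /mnt/usb; then
    echo "would run: umount /mnt/usb"
else
    echo "/mnt/usb is not mounted"
fi
```

The example only prints the command that it would run, so that it is safe to paste into a terminal.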

Attempts to unmount a filesystem are not always successful. The most common problem is that the filesystem is busy. That is, it is currently being used by some process (i.e., instance of a program in execution). In such a case an error message such as umount: /dir1: device is busy will be displayed on the screen. This busy state could be the result of something as simple as a GUI window being open that shows an icon of the directory containing the filesystem, in which case it can be easily solved by closing the window. Or it could be the result of a file on that filesystem being open, in which case all that is necessary is to close the file. In less obvious cases, it may be necessary to use a command such as ps or pstree to try to locate the offending process(es) and then use a command such as kill to terminate such process(es).
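
When the cause of a busy filesystem is not obvious, part of the search for the offending processes can be automated. The sketch below is a Linux-specific illustration that assumes the /proc filesystem is mounted; the function name pids_with_cwd_under is hypothetical. It only finds processes whose current working directory lies inside the given directory, which is just one of the ways in which a filesystem can be kept busy (open files are not detected). Where they are installed, the fuser and lsof utilities perform a more complete search.

```shell
# Print the PIDs of processes whose current working directory is inside $1.
# Linux-specific assumption: /proc/<pid>/cwd is a symlink to that directory.
pids_with_cwd_under() {
    target=$(cd "$1" && pwd -P) || return 1
    for link in /proc/[0-9]*/cwd; do
        # Links of other users' processes may be unreadable; skip those.
        cwd=$(readlink "$link" 2>/dev/null) || continue
        case "$cwd" in
            "$target" | "$target"/*)
                pid=${link#/proc/}
                echo "${pid%/cwd}"
                ;;
        esac
    done
}

# Hypothetical usage: inspect the processes found before deciding to kill any.
# for p in $(pids_with_cwd_under /dir1); do ps -p "$p"; done
```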

Another cause of failure is when a user attempts to unmount a filesystem that has already been unmounted. In such a case an error message such as umount: /dir1: not mounted will be returned.

In the event that the unmounting is successful, umount usually works silently; that is, there is no message on the screen to confirm its success. However, umount can be made to provide such a message by using the -v (i.e., verbose) option. (This should not be confused with the -V option, which merely returns information about the currently installed version of umount.)

umount allows the name of the physical device on which the filesystem resides to be included in the command if desired. This is convenient because it can minimize typing by allowing the user to utilize the upward pointing arrow on the keyboard to redisplay the command that was previously used to mount that filesystem (i.e., to use the shell's command history) and then merely insert the letter u before the word mount and press the Enter key in order to unmount the filesystem. Thus, for example, if a filesystem that is physically located on the second partition of the first HDD (which is designated by /dev/hda2) is mounted in a directory called /dir2, it can be unmounted with either of the following:

umount /dir2

umount /dev/hda2 /dir2

Interestingly, when the physical device is included, a confirmation message is automatically supplied.

There are several options that can be tried in the event that umount refuses to unmount a filesystem for no immediately apparent reason. Perhaps the most useful is the -l (i.e., lazy) option, which immediately detaches the filesystem from the main filesystem and then cleans up all references to the unmounted filesystem as soon as it is no longer busy. This capability requires Linux kernel 2.4.11 or later.

Another way to deal with an unmounting failure is to use the -r option, which remounts the filesystem as read-only if the unmounting fails. This presumably allows devices or media to be removed without affecting data which has just been written to them. In addition, the -f option forces unmounting in the case of an unreachable NFS (Network File System) filesystem.

The -a option causes all of the filesystems described in /etc/mtab to be unmounted. (However, with umount version 2.7 and later the proc filesystem is not unmounted.) /etc/mtab is a file that is similar to /etc/fstab and which is updated by mount and umount whenever filesystems are mounted or unmounted. The -n option causes unmounting to occur without writing to /etc/mtab.

The -t option followed by the filesystem type indicates that the actions should only be taken on filesystems of that type. Multiple types can be specified in a comma-separated list. This list can be prefixed with the word no to specify filesystem types on which no action should be taken.
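
The selection rule for the -t type list can be illustrated with a short sketch. This is not umount's own code: the function name type_selected and its argument order are made up for the illustration, and it assumes the simple form described above, in which the prefix no applies to the whole list.

```shell
# Decide whether a filesystem of type $2 is selected by the -t style list $1.
# Example lists: "ext3,vfat" (only these types) or "nonfs,proc" (all but these).
type_selected() {
    list=$1
    fstype=$2
    negate=0
    case "$list" in
        no*) negate=1; list=${list#no} ;;
    esac
    match=0
    # Wrapping both sides in commas avoids partial matches such as "ext" vs "ext3".
    case ",$list," in
        *",$fstype,"*) match=1 ;;
    esac
    # Selected when the type matches a plain list, or fails to match a "no" list.
    [ "$match" -ne "$negate" ]
}

type_selected "ext3,vfat" vfat && echo "vfat selected"
type_selected "nonfs,proc" nfs || echo "nfs skipped"
type_selected "nonfs,proc" ext3 && echo "ext3 selected"
```

With real filesystems, the corresponding commands would take a form such as umount -a -t ext3,vfat or umount -a -t nonfs,proc.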

The -O option indicates that the actions should only be taken on filesystems that have the specified options in /etc/fstab. Multiple options can be specified in a comma-separated list. Those options for which no action should be taken can be prefixed with no.

umount will free any loop device associated with a mounted filesystem if it finds the option loop=... in /etc/mtab or if the -d option is used. A loop device is a pseudo-device that makes a regular file accessible as if it were a block device; it is used mainly for mounting filesystem images that are stored in files and for encrypting filesystems.

Note the symmetry between the umount and mount commands, including the fact that many of the options are identical or very similar (including -a, -h, -r, -t, -O, -v and -V). This is consistent with the Unix philosophy, a fundamental component of which is simplicity (and hence consistency to the extent practical among commands), in that it eliminates unnecessary complexity.

umount could have instead been called unmount. This might have simplified things for people who are new to the command line (i.e., text-only operation). However, eliminating unnecessary typing is also a part of the Unix philosophy, and thus the n was not used.

The head Command

http://www.linfo.org/head.html

The head command reads the first few lines of any text given to it as an input and writes them to standard output (which, by default, is the display screen).

head's basic syntax is:

head [options] [file(s)]

The square brackets indicate that the enclosed items are optional. By default, head returns the first ten lines of each file whose name is provided to it.

For example, the following will display the first ten lines of the file named aardvark in the current directory (i.e., the directory in which the user is currently working):

head aardvark

If more than one input file is provided, head will return the first ten lines from each file, precede each set of lines by the name of the file and separate each set of lines by one vertical space. The following is an example of using head with two input files:

head aardvark armadillo
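
To see the format of the file name headers concretely, the following sketch creates two small throwaway files in a temporary directory (the file contents are invented for the demonstration) and runs head on both of them. The header layout described in the comment is that of GNU head.

```shell
# Work in a scratch directory so that no existing files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf 'a1\na2\na3\n' > aardvark
printf 'b1\nb2\nb3\n' > armadillo

# Each set of lines is preceded by "==> name <==" and the sets are
# separated by one blank line.
head aardvark armadillo
```

Because each file here has only three lines, head returns all of them; it never pads its output to reach ten lines.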

If it is desired to obtain some number of lines other than the default ten, the -n option can be used followed by an integer indicating the number of lines desired. For example, the above example could be modified to display the first 15 lines from each file:

head -n15 aardvark armadillo

-n is a very tolerant option. For example, it is not necessary for the integer to directly follow it without a space in between. Thus, the following command would produce the same result:

head -n 15 aardvark armadillo

In fact, the letter n does not even need to be used at all. Just the hyphen and the integer (with no intervening space) are sufficient to tell head how many lines to return. Thus, the following would produce the same result as the above commands:

head -15 aardvark armadillo
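
The equivalence of the three spellings can be checked with a short experiment. This is only a sketch: the sample file is generated with seq (assumed to be installed, as it is on typical Linux systems) and the temporary file names are arbitrary.

```shell
# Build a 20-line sample file (name and contents are arbitrary).
sample=$(mktemp)
seq 1 20 > "$sample"

# All three spellings request the first 15 lines.
head -n15 "$sample" > "$sample.a"
head -n 15 "$sample" > "$sample.b"
head -15 "$sample" > "$sample.c"

# cmp is silent and returns success when two files are identical.
cmp "$sample.a" "$sample.b" && cmp "$sample.b" "$sample.c" && echo "identical output"
```

The form with -n is the one specified by POSIX, so it is the safest choice in scripts that may run on other Unix-like systems.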

head can also return any desired number of bytes (i.e., a sequence of eight bits and usually long enough to represent a single character) from the start of each file rather than a desired number of lines. This is accomplished using the -c option followed by the number of bytes desired. For example, the following would display the first five bytes of each of the two files provided:

head -c 5 aardvark anteater

When head counts by bytes, it also includes the newline character, which is a non-printing (i.e., invisible) character that is designated by a backslash and the letter n (i.e., \n). Thus, for example, if there are three new, blank lines at the start of a file, they will be counted as three characters, along with the printing characters (i.e., characters that are visible on the monitor screen or on paper).

The number of bytes or lines can be followed by a multiplier suffix. That is, adding the letter b directly after the number of bytes multiplies it by 512, k multiplies it by 1024 and m multiplies it by 1048576. Thus, the following command would display the first five kilobytes of the file aardvark:

head -c5k aardvark
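
Both points, that newline characters are counted and that a suffix multiplies the count, can be observed with the following sketch. It assumes GNU head (other implementations may not accept the suffixes) and uses od and /dev/zero purely as convenient tools for the demonstration.

```shell
# Three leading blank lines followed by text: each newline counts as one byte,
# so the first five bytes are three newlines plus the letters a and b.
printf '\n\n\nabcdef\n' | head -c 5 | od -c

# Multiplier suffix: 5k means 5 * 1024 = 5120 bytes.
head -c5k /dev/zero | wc -c
```

od -c displays each byte individually and shows the otherwise invisible newlines as \n.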

The -c option is less tolerant than the -n option. That is, there is no default number of bytes, and thus some integer must be supplied. Also, the letter c cannot be omitted as can the letter n, because in that case head would interpret the hyphen and integer combination as the -n option. Thus, for example, the following would produce an error message something like head: aardvark: invalid number of bytes:

head -c aardvark

If head is used without any options or arguments (i.e., file names), it will await input from the keyboard and will successively repeat (i.e., each line will appear twice) on the monitor screen each of the first ten lines typed on the keyboard. If it were desired to repeat some number of lines other than the default ten, then the -n option would be used followed by the integer representing that number of lines (although, again, it is not necessary to include the letter n), e.g.,

head -n3

As is the case with other command line (i.e., all-text mode) programs in Linux and other Unix-like operating systems, the output from head can be redirected from the display monitor to a file or printer using the output redirection operator (which is represented by a rightward-pointing angle bracket). For example, the following would copy the first 12 lines of the file Yuriko to the file December:

head -n 12 Yuriko > December

If the file named December did not yet exist, the redirection operator would create it; if it already existed, the redirection operator would overwrite it. To avoid erasing data on an existing file, the append operator (which is represented by two consecutive rightward pointing angle brackets) could be used to add the output from head to the end of a file with that name if it already existed (or otherwise create a new file with that name), i.e.,

head -n 12 Yuriko >> December
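
The difference between the two operators can be seen in the following sketch, which uses scratch copies of Yuriko and December in a temporary directory (their contents are invented for the demonstration, and seq is assumed to be installed).

```shell
# Scratch files standing in for Yuriko and December in the text.
work=$(mktemp -d)
seq 1 30 > "$work/Yuriko"
echo "old contents" > "$work/December"

# ">" replaces whatever December held before.
head -n 12 "$work/Yuriko" > "$work/December"
wc -l < "$work/December"      # 12 lines: the old line is gone

# ">>" keeps the existing 12 lines and adds 12 more.
head -n 12 "$work/Yuriko" >> "$work/December"
wc -l < "$work/December"      # 24 lines
```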

The output from other commands can be sent via a pipe (represented by the vertical bar character) to head to use as its input. For example, the following sends the output from the ls command (which by default lists the names of the files and directories in the current directory) to head, which, in turn, displays the first ten lines of the output that it receives from ls:

ls | head

This output could easily be redirected, for example to the end of a file named file1 as follows:

ls | head >> file1

It could also be piped to one or more filters for additional processing. For example, the sort filter could be used with its -r option to sort the output in reverse alphabetic order prior to appending it to file1:

ls | head | sort -r >> file1
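
The order of the stages in such a pipeline matters, as the following sketch shows. It uses a scratch directory containing twelve empty files whose names (file01 through file12) are invented for the demonstration, and it writes to the screen rather than to file1.

```shell
# A scratch directory with 12 files named file01 ... file12.
dir=$(mktemp -d)
for i in 01 02 03 04 05 06 07 08 09 10 11 12; do
    : > "$dir/file$i"
done

# ls lists 12 names, head keeps the first 10, sort -r reverses their order,
# so the output runs from file10 down to file01.
ls "$dir" | head | sort -r
```

Placing sort -r before head instead (i.e., ls | sort -r | head) would select a different set of names, namely the ten that come last alphabetically.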

The -q (i.e., quiet) option causes head to not show the file name before each set of lines in its output and to eliminate the vertical space between each set of lines when there are multiple input sources. Its opposite, the -v (i.e., verbose) option, causes head to provide the file name even if there is just a single input file.
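
The effect of these two options can be seen in the following sketch, which assumes GNU head and uses two throwaway files in a temporary directory (their contents are invented for the demonstration).

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'a1\na2\n' > aardvark
printf 'b1\nb2\n' > armadillo

# -q: no "==> name <==" headers and no blank separator line.
head -q -n 1 aardvark armadillo

# -v: the header is printed even for a single file.
head -v -n 1 aardvark
```

The -q option is convenient when the first lines of several files are to be combined into one stream for further processing.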

The tail command is similar to the head command except that it reads the final lines in files rather than the first lines.
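
Because the two commands mirror each other, they are often combined in a pipeline to extract a range of lines from the middle of a file. The following is a small sketch of that idea; the sample file is generated with seq (assumed to be installed) and its name is arbitrary.

```shell
# A 20-line sample file; the name "sample" is arbitrary.
sample=$(mktemp)
seq 1 20 > "$sample"

# Last three lines of the file.
tail -n 3 "$sample"

# Lines 6 through 10: keep the first 10 lines, then the last 5 of those.
head -n 10 "$sample" | tail -n 5
```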

As is the case with other commands on Unix-like operating systems, additional information can be obtained about head and tail by using the man and info commands to reference the built-in documentation, for example

man head

info tail

Index of Linux Commands

http://www.linfo.org/command_index.html

alias: allows launching of any command or combination of commands by using a preset character or series of characters.

apropos: displays a list of all topics in the built-in user manual that are related to the subject of a query.

bzip2: used for compressing and decompressing files.

cat: (short for concatenate) has three related functions with regard to text files: displaying them, combining copies of them and creating new ones.

cd: changes directories.

clear: removes all previous commands and output from consoles and terminal windows.

cp: copies files and directories.

df: reports the amount of space used and available on currently mounted filesystems.

dmesg: reads the kernel messages.

du: shows the sizes of directories and files.

fdformat: performs low-level formatting of floppy disks.

file: classifies filesystem objects.

free: provides information about unused and used memory and swap space.

grep: searches text.

head: by default reads the first ten lines of text.

hostname: shows or sets a computer's host name and domain name.

kdesu: opens KDE su, the graphical front end for the su command.

kill: terminates stalled processes without having to log out or reboot.

killall: terminates all processes associated with programs whose names are provided to it as arguments.

locate: finds files and directories.

man: formats and displays the built-in manual pages.

mkbootdisk: creates an emergency boot floppy.

mkdir: creates new directories.

mkfs: creates a filesystem on a disk or on a partition thereof.

mv: renames and moves files and directories.

ps: (short for process status) lists the currently running processes and their process identification numbers (PIDs).

pstree: displays the processes on the system in the form of a tree diagram.

pwd: (short for print working directory) displays the full path to the current directory.

reboot: restarts a computer without having to turn the power off and back on.

rm: deletes the specified files and directories.

rmdir: deletes the specified empty directories.

runlevel: reports the current and previous runlevels.

shred: destroys files.

spell: checks spelling.

strings: returns each string of printable characters in files.

su: (short for substitute user) changes a login session's owner without the owner having to first log out of that session.

tail: by default reads the final ten lines of text.

tar: converts a group of files into an archive.

touch: the easiest way to create new, empty files.

tr: translates or deletes characters.

unalias: removes entries from the current user's list of aliases.

uname: provides basic information about a system's software and hardware.

uptime: shows the current time, how long the system has been running since it was booted up, how many user sessions are currently open and the load averages.

w: shows who is logged into the system and what they are doing.

wc: by default counts the number of lines, words and characters that are contained in text.

whatis: provides very brief descriptions of command line programs and other topics related to Unix-like operating systems.

whereis: locates the binary, source code and man page for any specified program.

whoami: returns the user name of the owner of the current login session.

________
The above commands are those that are described in detail by The Linux Information Project. They represent only a fraction of the total number of standard commands typically included in Linux and other Unix-like operating systems. In keeping with the Unix philosophy, most are small, independent, and highly specialized programs.

Linux Resources for Educators

http://www.linfo.org/edu_resources.html

Below are links to some of the best websites about the use of Linux and other open source software in educational institutions:

Authenticated User Community - an intranet system for kindergarten through high school use which provides a uniform web-based interface for discussion forums, e-mail, file management and a searchable user database. Also, "Interactive Classrooms" provide a means for students and teachers to have a web-based extension to their in-class interaction.

The Case for Linux in Universities - a long, single-page article with numerous links.

Discussion on using Linux in education - provides a mailing list, a catalog of free educational software and Linux case studies.

Fossil Lab Home Page - the Free/Open Source Laboratory at Worcester Polytechnic Institute (in Worcester, MA) is funded under an NSF grant. It allows students to run experiments on dedicated machines, do kernel "hacking" and gain valuable system administration experience that is not possible in conventional computer science laboratories.

K12Admin - a system developed in Northern British Columbia, Canada for administering Linux servers in individual kindergarten through 12th grade schools. It allows the staff in each school to maintain their own student and staff accounts while providing a homogeneous network throughout the school district.

K-12 Linux Project - contains three linked sites providing software, tutorials and discussion forums.

LearnLoop - open source groupware being developed to support education and collaboration.

Linux and Education - an article by the Bellevue Linux Users Group that discusses the advantages of using Linux rather than proprietary software in the classroom.

Linux in Higher Education: Open Source, Open Minds, Social Justice - a 2000 article in Linux Journal advocating the adoption of Linux as an international standard for computing in higher education.

The Linux for Schools Project - a single page site (as of March, 2004) that explains techniques for efficiently adding and removing large numbers of user accounts.

Open Administration for Schools - an open source school administration program. It can support multiple schools on a single, central server and provides separate, secure websites for use both by the school office and by teachers in the classroom.

The Open Source Education Foundation - a non-profit company devoted to enhancing kindergarten through high school education through the use of technologies and concepts derived from the open source and free software movements.

Open Source Software in American Public Schools - an article by Bill French, a graduate student at the University of California at Berkeley's School of Information Management and Systems.

Penguin Enrolls in U.S. Schools - a 2001 article from Wired.

Open Source Educational Group - information about open source computing for educational and governmental institutions.

Schoolforge News-Journal - provides articles and links as well as tools and materials to create a school and all its parts.

Software Freedom, Open Software and the Undergraduate Computer Science Curriculum - an article by John Howland of the Department of Computer Science, Trinity University in San Antonio, Texas.

Open Source in Education - an informal essay designed to help educators better understand open source software.

Site@School - a program to manage and maintain the website of a primary school without technical knowledge. Pupils can have personal pages on the site and teachers can check them before publication.

SWEEPING INITIATIVE PUTS 80,000 COMPUTERS RUNNING GNOME . . . - a brief article about the installation by the regional government of Extremadura, Spain of 80,000 Linux computers in its schools.

Trinity drinks deeply at learning's open source - a 2001 article by Nathan Cochrane about how Trinity College at Melbourne University (in Melbourne, Australia) discarded its Windows NT network and replaced it with Debian Linux.

Why should open source software be used in schools? - a brief article followed by numerous comments from educators.

Links

Linux Lab Workbook
http://linuxvm.org/present/SHARE103/S9242nfb.pdf

Linux Lab Manuals
http://www.labmanual.org/tiki-custom_home.php

Lab 1: Accessing the Linux Operating System
http://simms-teach.com/2009-spring/docs/cis90/cis90lab01.pdf

Linux Lab Exercises
http://www.freedownloadmanager.org/downloads/linux-lab-exercises-1630994.html

Lab Manual

http://www.cs.gmu.edu/~astavrou/courses/isa_656_F07/ISA_656_F07_LabManual_w_solution.pdf